Distributed asynchronous localization and mapping for augmented reality

ABSTRACT

A system and method for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices is described. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-world environment and one or more mobile devices. Each of the one or more mobile devices utilizes a separate asynchronous computing pipeline for localizing the mobile device and rendering virtual objects from a point of view of the mobile device. This distributed approach provides an efficient way for supporting mapping and localization processes for a large number of mobile devices, which are typically constrained by form factor and battery life limitations.

BACKGROUND

Augmented reality (AR) relates to providing an augmented real-world environment where the perception of a real-world environment (or data representing a real-world environment) is augmented or modified with computer-generated virtual data. For example, data representing a real-world environment may be captured in real-time using sensory input devices such as a camera or microphone and augmented with computer-generated virtual data including virtual images and virtual sounds. The virtual data may also include information related to the real-world environment, such as a text description associated with a real-world object in the real-world environment. An AR environment may be used to enhance numerous applications, including video game, mapping, navigation, and mobile device applications.

Some AR environments enable the perception of real-time interaction between real objects (i.e., objects existing in a particular real-world environment) and virtual objects (i.e., objects that do not exist in the particular real-world environment). In order to realistically integrate the virtual objects into an AR environment, an AR system typically performs several steps, including mapping and localization. Mapping relates to the process of generating a map of the real-world environment. Localization relates to the process of locating a particular point of view or pose relative to the map. A fundamental requirement of many AR systems is the ability to localize the pose of a mobile device moving within a real-world environment in order to determine the particular view associated with the mobile device that needs to be augmented.

In robotics, traditional methods employing simultaneous localization and mapping (SLAM) techniques have been used by robots and autonomous vehicles in order to build a map of an unknown environment (or to update a map within a known environment) while simultaneously tracking their current location for navigation purposes. Most SLAM approaches are incremental, meaning that they iteratively update the map and then update the estimated camera pose in the same process. An extension of SLAM is parallel tracking and mapping (PTAM), which separates the mapping and localization steps into parallel computation threads. Both SLAM and PTAM techniques produce sparse point clouds as maps. Sparse point clouds may be sufficient for enabling camera localization, but may not be sufficient for enabling complex augmented reality applications such as those that must handle collisions and occlusions due to the interaction of real objects and virtual objects. Both SLAM and PTAM techniques utilize a common sensing source for both the mapping and localization steps.

SUMMARY

Technology is described for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-world environment and one or more mobile devices. Each of the one or more mobile devices utilizes a separate asynchronous computing pipeline for localizing the mobile device and rendering virtual objects from a point of view of the mobile device. This distributed approach provides an efficient way for supporting mapping and localization processes for a large number of mobile devices, which are typically constrained by form factor and battery life limitations.

One embodiment includes determining whether a first map is required, acquiring the first map, storing the first map on a mobile device, determining a first pose associated with the mobile device, determining whether a second map is required (including determining whether a virtual object is located within a field of view associated with the first pose), acquiring the second map, storing the second map on the mobile device, registering a virtual object in relation to the second map, rendering the virtual object, and displaying on the mobile device a virtual image associated with the virtual object corresponding with a view of the virtual object such that the virtual object is perceived to exist within the field of view associated with the first pose.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of one embodiment of a networked computing environment in which the disclosed technology may be practiced.

FIG. 2 depicts one embodiment of a portion of an HMD.

FIG. 3A depicts one embodiment of a field of view as seen by a user wearing an HMD.

FIG. 3B depicts one embodiment of an AR environment.

FIG. 3C depicts one embodiment of an AR environment.

FIG. 4 illustrates one embodiment of a mapping system.

FIG. 5 depicts one embodiment of an AR system.

FIG. 6A is a flowchart describing one embodiment of a process for generating a 3-D map of a real-world environment and locating virtual objects within the 3-D map.

FIG. 6B is a flowchart describing one embodiment of a process for updating a 3-D map of a real-world environment.

FIG. 7 is a flowchart describing one embodiment of a process for rendering and displaying virtual objects.

FIG. 8 is a block diagram of an embodiment of a gaming and media system.

FIG. 9 is a block diagram of one embodiment of a mobile device.

FIG. 10 is a block diagram of an embodiment of a computing system environment.

DETAILED DESCRIPTION

Technology is described for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-world environment and one or more mobile devices. Each of the one or more mobile devices utilizes a separate asynchronous computing pipeline for localizing the mobile device and rendering virtual objects from a point of view of the mobile device. This distributed approach provides an efficient way for supporting mapping and localization processes for a large number of mobile devices, which are typically constrained by form factor and battery life limitations.

An AR system may allow a user of a mobile device to view augmented images in real-time as the user moves about within a particular real-world environment. In order for the AR system to enable the perceived interaction between the particular real-world environment and virtual objects within the particular real-world environment, several problems must be solved. First, the AR system must generate a map of the particular real-world environment in which the virtual objects are to appear and interact (i.e., perform a mapping step). Second, the AR system must determine a particular pose associated with the mobile device (i.e., perform a localization step). Finally, the virtual objects that are to appear within the field of view of the particular pose must be registered or aligned with a coordinate system associated with the particular real-world environment. Solving these problems takes considerable processing power, and providing the computer hardware necessary to accomplish real-time AR is especially difficult because of the form factor and power limitations imposed on mobile devices.

In general, solving the mapping problem is computationally much more difficult than solving the localization problem and may require more computational power as well as more sophisticated sensors for increased robustness and accuracy. However, the mapping problem need not be solved in real-time if landmarks or other real objects within the real-world environment are static (i.e., do not move) or semi-static (i.e., do not move often). On the other hand, mobile device localization does need to be performed in real-time in order to enable the perception of real-time interaction between real objects and virtual objects.

FIG. 1 is a block diagram of one embodiment of a networked computing environment 100 in which the disclosed technology may be practiced. Networked computing environment 100 includes a plurality of computing devices interconnected through one or more networks 180. The one or more networks 180 allow a particular computing device to connect to and communicate with another computing device. The depicted computing devices include mobile device 140, mobile devices 110 and 120, desktop computer 130, and mapping server 150. In some embodiments, the plurality of computing devices may include other computing devices not shown. In some embodiments, the plurality of computing devices may include more or fewer computing devices than the number shown in FIG. 1. The one or more networks 180 may include a secure network such as an enterprise private network, an unsecure network such as a wireless open network, a local area network (LAN), a wide area network (WAN), and the Internet. Each network of the one or more networks 180 may include hubs, bridges, routers, switches, and wired transmission media such as a wired network or direct-wired connection.

A server, such as mapping server 150, may allow a client to download information (e.g., text, audio, image, and video files) from the server or to perform a search query related to particular information stored on the server. In general, a “server” may include a hardware device that acts as the host in a client-server relationship or a software process that shares a resource with or performs work for one or more clients. Communication between computing devices in a client-server relationship may be initiated by a client sending a request to the server asking for access to a particular resource or for particular work to be performed. The server may subsequently perform the actions requested and send a response back to the client.

One embodiment of mobile device 140 includes a camera 148, microphone 149, network interface 145, processor 146, and memory 147, all in communication with each other. Camera 148 may capture digital images and/or videos. Microphone 149 may capture sounds. Network interface 145 allows mobile device 140 to connect to one or more networks 180. Network interface 145 may include a wireless network interface, a modem, and/or a wired network interface. Processor 146 allows mobile device 140 to execute computer readable instructions stored in memory 147 in order to perform the processes discussed herein.

Networked computing environment 100 may provide a cloud computing environment for one or more computing devices. Cloud computing refers to Internet-based computing, wherein shared resources, software, and/or information are provided to one or more computing devices on-demand via the Internet (or other global network). The term “cloud” is used as a metaphor for the Internet, based on the cloud drawings used in computer network diagrams to depict the Internet as an abstraction of the underlying infrastructure it represents.

In one example of an AR environment, one or more users may move around a real-world environment (e.g., a living room), each wearing special glasses, such as mobile device 140, that allow the one or more users to observe views of the real world overlaid with virtual images of virtual objects that maintain a coherent spatial relationship with the real-world environment (i.e., as a particular user turns their head or moves within the real-world environment, the virtual images displayed to the particular user will change such that the virtual objects appear to exist within the real-world environment as perceived by the particular user). In one embodiment, environmental mapping of the real-world environment is performed by mapping server 150 (i.e., on the server side) while camera localization is performed on mobile device 140 (i.e., on the client side). The one or more users may also perceive 3-D localized virtual surround sounds that match the general acoustics of the real-world environment and that seem fixed in space with respect to the real-world environment.

In another example, live video images captured using a video camera on a mobile device, such as mobile device 120, may be augmented with computer-generated images of a virtual monster. The resulting augmented video images may then be displayed on a display of the mobile device in real-time such that an end user of the mobile device sees the virtual monster interacting with the real-world environment captured by the mobile device. In another example, the perception of a real-world environment may be augmented via a head-mounted display device (HMD), which may comprise a video see-through and/or an optical see-through system. Mobile device 140 is one example of an optical see-through HMD. An optical see-through HMD worn by an end user may allow actual direct viewing of a real-world environment (e.g., via transparent lenses) and may, at the same time, project images of a virtual object into the visual field of the end user, thereby augmenting the real-world environment perceived by the end user with the virtual object.

FIG. 2 depicts one embodiment of a portion of an HMD, such as mobile device 140 in FIG. 1. Only the right side of a head-mounted device is depicted. HMD 200 includes right temple 202, nose bridge 204, eye glass 216, and eye glass frame 214. Built into nose bridge 204 is a microphone 210 for recording sounds and transmitting the audio recording to processing unit 236. A front facing camera 213 is embedded inside right temple 202 for recording digital images and/or videos and transmitting the visual recordings to processing unit 236. Front facing camera 213 may capture color information, IR information, and/or depth information. Microphone 210 and front facing camera 213 are in communication with processing unit 236.

Also embedded inside right temple 202 are ear phones 230, motion and orientation sensor 238, GPS receiver 232, power supply 239, and wireless interface 237, all in communication with processing unit 236. Motion and orientation sensor 238 may include a three axis magnetometer, a three axis gyro, and/or a three axis accelerometer. In one embodiment, the motion and orientation sensor 238 may comprise an inertial measurement unit (IMU). The GPS receiver may determine a GPS location associated with HMD 200. Processing unit 236 may include one or more processors and a memory for storing computer readable instructions to be executed on the one or more processors. The memory may also store other types of data to be processed by the one or more processors.

In one embodiment, eye glass 216 may comprise a see-through display, whereby virtual images generated by processing unit 236 may be projected and/or displayed on the see-through display. The front facing camera 213 may be calibrated such that the field of view captured by the front facing camera 213 corresponds with the field of view as seen by a user of HMD 200. The ear phones 230 may be used to output virtual sounds associated with the virtual images. In some embodiments, HMD 200 may include two or more front facing cameras (e.g., one on each temple) in order to obtain depth from stereo information associated with the field of view captured by the front facing cameras. The two or more front facing cameras may also comprise 3-D, IR, and/or RGB cameras. Depth information may also be acquired from a single camera utilizing depth from motion techniques. For example, two images may be acquired from the single camera associated with two different points in space at different points in time. Parallax calculations may then be performed given position information regarding the two different points in space.
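
As a minimal illustration of the depth from motion technique described above, the following Python sketch computes depth from the parallax of a single matched point observed from two known camera positions. The focal length and baseline are hypothetical values and a rectified pinhole model is assumed; the sketch is illustrative only and not part of the HMD implementation.

    # Hypothetical pinhole-camera parameters (illustrative only).
    FOCAL_LENGTH_PX = 525.0   # focal length in pixels
    BASELINE_M = 0.10         # distance between the two capture positions, in meters

    def depth_from_motion(x_first_px, x_second_px):
        """Estimate depth from the parallax (disparity) of one matched point
        seen from two known positions of a single moving camera."""
        disparity = x_first_px - x_second_px   # horizontal shift of the point, in pixels
        if disparity <= 0:
            return None                        # no usable parallax
        return FOCAL_LENGTH_PX * BASELINE_M / disparity

    # Example: a point that shifts by 21 pixels between the two images
    print(depth_from_motion(320.0, 299.0))     # ~2.5 meters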

FIG. 3A depicts one embodiment of a field of view as seen by a user wearing an HMD such as mobile device 140 in FIG. 1. The user may see within the field of view both real objects and virtual objects. The real objects may include a chair 16 and a mapping system 10 (e.g., comprising a portion of an entertainment system). The virtual objects may include a virtual monster 17. As the virtual monster 17 is displayed or overlaid over the real-world environment as perceived through the see-through lenses of the HMD, the user may perceive that the virtual monster 17 exists within the real-world environment.

FIG. 3B depicts one embodiment of an AR environment. The AR environment includes a mapping system 10 and mobile devices 18 and 19. The mobile devices 18 and 19, depicted as special glasses in FIG. 3B, may include an HMD such as HMD 200 in FIG. 2. The mapping system 10 may include a computing environment 12, a capture device 20, and a display 14, all in communication with each other. Computing environment 12 may include one or more processors. Capture device 20 may include a color or depth sensing camera that may be used to visually monitor one or more targets including humans and one or more other objects within a particular environment. In one example, capture device 20 may comprise an RGB or depth camera and computing environment 12 may comprise a set-top box or gaming console. Mapping system 10 may support multiple mobile devices or clients.

The mobile devices 18 and 19 may include HMDs. As shown in FIG. 3B, user 28 wears mobile device 18 and user 29 wears mobile device 19. The mobile devices 18 and 19 may receive virtual data from mapping system 10 such that a virtual object is perceived to exist within a field of view as displayed through the respective mobile device. For example, as seen by user 28 through mobile device 18, the virtual object is displayed as the back of virtual monster 17. As seen by user 29 through mobile device 19, the virtual object is displayed as the front of virtual monster 17 appearing above the back of chair 16. The rendering of virtual monster 17 may be performed by mapping system 10 or by mobile devices 18 and 19. In one embodiment, mapping system 10 renders images of virtual monster 17 associated with a field of view of a particular mobile device and transmits the rendered images to the particular mobile device.

FIG. 3C depicts one embodiment of an AR environment utilizing the mapping system 10 and mobile devices 18 and 19 depicted in FIG. 3B. The mapping system 10 may track and analyze virtual objects within a particular environment, such as virtual ball 27. The mapping system 10 may also track and analyze real objects within the particular environment, such as user 28, user 29, and chair 16. The rendering of images associated with virtual ball 27 may be performed by mapping system 10 or by mobile devices 18 and 19.

In one embodiment, mapping system 10 tracks the position of virtual objects by taking into consideration the interaction between real and virtual objects. For example, user 28 may move their arm such that user 28 perceives hitting virtual ball 27. The mapping system 10 may subsequently apply a virtual force to virtual ball 27 such that both users 28 and 29 perceive that the virtual ball has been hit by user 28. In one example, mapping system 10 may register the placement of virtual ball 27 within a 3-D map of the particular environment and provide virtual data information to mobile devices 18 and 19 such that users 28 and 29 perceive the virtual ball 27 as existing within the particular environment from their respective points of view. In another embodiment, a particular mobile device may render virtual objects that are specific to the particular mobile device. For example, if the virtual ball 27 is only rendered on mobile device 18, then the virtual ball 27 would only be perceived as existing within the particular environment by user 28. In some embodiments, the dynamics of virtual objects may be computed on the particular mobile device and not on the mapping system.

FIG. 4 illustrates one embodiment of a mapping system 50 including a capture device 58 and computing environment 54. Mapping system 50 is one example of an implementation for mapping system 10 in FIGS. 3B-3C. For example, computing environment 54 may correspond with computing environment 12 in FIGS. 3B-3C and capture device 58 may correspond with capture device 20 in FIGS. 3B-3C.

In one embodiment, the capture device 58 may include one or more image sensors for capturing images and videos. An image sensor may comprise a CCD image sensor or a CMOS sensor. In some embodiments, capture device 58 may include an IR CMOS image sensor. The capture device 58 may also include a depth camera (or depth sensing camera) configured to capture video with depth information including a depth image that may include depth values via any suitable technique including, for example, time-of-flight, structured light, stereo image, or the like.

The capture device 58 may include an image camera component 32. In one embodiment, the image camera component 32 may include a depth camera that may capture a depth image of a scene. The depth image may include a two-dimensional (2-D) pixel area of the captured scene where each pixel in the 2-D pixel area may represent a depth value such as a distance in, for example, centimeters, millimeters, or the like of an object in the captured scene from the camera.

The image camera component 32 may include an IR light component 34, a three-dimensional (3-D) camera 36, and an RGB camera 38 that may be used to capture the depth image of a capture area. For example, in time-of-flight analysis, the IR light component 34 of the capture device 58 may emit an infrared light onto the capture area and may then use sensors to detect the backscattered light from the surface of one or more objects in the capture area using, for example, the 3-D camera 36 and/or the RGB camera 38. In some embodiments, pulsed infrared light may be used such that the time between an outgoing light pulse and a corresponding incoming light pulse may be measured and used to determine a physical distance from the capture device 58 to a particular location on the one or more objects in the capture area. Additionally, the phase of the outgoing light wave may be compared to the phase of the incoming light wave to determine a phase shift. The phase shift may then be used to determine a physical distance from the capture device to a particular location associated with the one or more objects.
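
By way of illustration, the phase-shift relationship described above can be expressed in a few lines of Python. The modulation frequency and phase shift below are hypothetical values; the formula simply halves the round-trip distance implied by the measured phase.

    import math

    SPEED_OF_LIGHT = 299_792_458.0            # meters per second

    def tof_distance(phase_shift_rad, modulation_freq_hz):
        """Distance implied by the phase shift between the outgoing and the
        incoming modulated IR light (round-trip distance halved)."""
        return SPEED_OF_LIGHT * phase_shift_rad / (4.0 * math.pi * modulation_freq_hz)

    # Example: a quarter-cycle phase shift at a 30 MHz modulation frequency
    print(tof_distance(math.pi / 2.0, 30e6))  # ~1.25 meters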

In another example, the capture device 58 may use structured light to capture depth information. In such an analysis, patterned light (i.e., light displayed as a known pattern such as a grid pattern or a stripe pattern) may be projected onto the capture area via, for example, the IR light component 34. Upon striking the surface of one or more objects (or targets) in the capture area, the pattern may become deformed in response. Such a deformation of the pattern may be captured by, for example, the 3-D camera 36 and/or the RGB camera 38 and analyzed to determine a physical distance from the capture device to a particular location on the one or more objects.

In some embodiments, two or more different cameras may be incorporated into an integrated capture device. For example, a depth camera and a video camera (e.g., an RGB video camera) may be incorporated into a common capture device. In some embodiments, two or more separate capture devices of the same or differing types may be used cooperatively. For example, a depth camera and a separate video camera may be used, two video cameras may be used, two depth cameras may be used, two RGB cameras may be used, or any combination and number of cameras may be used. In one embodiment, the capture device 58 may include two or more physically separated cameras that may view a capture area from different angles to obtain visual stereo data that may be resolved to generate depth information. Depth may also be determined by capturing images using a plurality of detectors that may be monochromatic, infrared, RGB, or any other type of detector and performing a parallax calculation. Other types of depth image sensors can also be used to create a depth image.

As shown in FIG. 4, capture device 58 may include a microphone 40. The microphone 40 may include a transducer or sensor that may receive and convert sound into an electrical signal.

The capture device 58 may include a processor 42 that may be in operative communication with the image camera component 32. The processor may include a standardized processor, a specialized processor, a microprocessor, or the like. The processor 42 may execute instructions that may include instructions for storing filters or profiles, receiving and analyzing images, determining whether a particular situation has occurred, or any other suitable instructions. It is to be understood that at least some image analysis and/or target analysis and tracking operations may be executed by processors contained within one or more capture devices such as capture device 58.

The capture device 58 may include a memory 44 that may store the instructions that may be executed by the processor 42, images or frames of images captured by the 3-D camera or RGB camera, filters or profiles, or any other suitable information, images, or the like. In one example, the memory 44 may include random access memory (RAM), read only memory (ROM), cache, Flash memory, a hard disk, or any other suitable storage component. As shown in FIG. 4, the memory 44 may be a separate component in communication with the image capture component 32 and the processor 42. In another embodiment, the memory 44 may be integrated into the processor 42 and/or the image capture component 32. In other embodiments, some or all of the components 32, 34, 36, 38, 40, 42, and 44 of the capture device 58 illustrated in FIG. 4 are housed in a single housing.

The capture device 58 may be in communication with the computing environment 54 via a communication link 46. The communication link 46 may be a wired connection including, for example, a USB connection, a FireWire connection, an Ethernet cable connection, or the like, and/or a wireless connection such as a wireless 802.11b, g, a, or n connection. The computing environment 54 may provide a clock to the capture device 58 that may be used to determine when to capture, for example, a scene via the communication link 46. In one embodiment, the capture device 58 may provide the images captured by, for example, the 3-D camera 36 and/or the RGB camera 38 to the computing environment 54 via the communication link 46.

As shown in FIG. 4, computing environment 54 includes image and audio processing engine 194 in communication with operating system 196. Image and audio processing engine 194 includes virtual data engine 197, gesture recognizer engine 190, structure data 198, processing unit 191, and memory unit 192, all in communication with each other. Image and audio processing engine 194 processes video, image, and audio data received from capture device 58. To assist in the detection and/or tracking of objects, image and audio processing engine 194 may utilize structure data 198 and gesture recognizer engine 190. Virtual data engine 197 processes virtual objects and registers the position and orientation of virtual objects in relation to various maps of a real-world environment stored in memory unit 192.

Processing unit 191 may include one or more processors for executing object, facial, and voice recognition algorithms. In one embodiment, image and audio processing engine 194 may apply object recognition and facial recognition techniques to image or video data. For example, object recognition may be used to detect particular objects (e.g., soccer balls, cars, or landmarks) and facial recognition may be used to detect the face of a particular person. Image and audio processing engine 194 may apply audio and voice recognition techniques to audio data. For example, audio recognition may be used to detect a particular sound. The particular faces, voices, sounds, and objects to be detected may be stored in one or more memories contained in memory unit 192.

In some embodiments, one or more objects being tracked may be augmented with one or more markers, such as an IR retroreflective marker, to improve object detection and/or tracking. Planar reference images, coded AR markers, QR codes, and/or bar codes may also be used to improve object detection and/or tracking. Upon detection of one or more objects, image and audio processing engine 194 may report to operating system 196 an identification of each object detected and a corresponding position and/or orientation.

The image and audio processing engine 194 may utilize structure data 198 while performing object recognition. Structure data 198 may include structural information about targets and/or objects to be tracked. For example, a skeletal model of a human may be stored to help recognize body parts. In another example, structure data 198 may include structural information regarding one or more inanimate objects in order to help recognize the one or more inanimate objects.

The image and audio processing engine 194 may also utilize gesture recognizer engine 190 while performing object recognition. In one example, gesture recognizer engine 190 may include a collection of gesture filters, each comprising information concerning a gesture that may be performed by a skeletal model. The gesture recognizer engine 190 may compare the data captured by capture device 58 in the form of the skeletal model and movements associated with it to the gesture filters in a gesture library to identify when a user (as represented by the skeletal model) has performed one or more gestures. In one example, image and audio processing engine 194 may use the gesture recognizer engine 190 to help interpret movements of a skeletal model and to detect the performance of a particular gesture.

More information about the detection and tracking of objects can be found in U.S. patent application Ser. No. 12/641,788, “Motion Detection Using Depth Images,” filed on Dec. 18, 2009; and U.S. patent application Ser. No. 12/475,308, “Device for Identifying and Tracking Multiple Humans over Time,” both of which are incorporated herein by reference in their entirety. More information about gesture recognizer engine 190 can be found in U.S. patent application Ser. No. 12/422,661, “Gesture Recognizer System Architecture,” filed on Apr. 13, 2009, incorporated herein by reference in its entirety. More information about recognizing gestures can be found in U.S. patent application Ser. No. 12/391,150, “Standard Gestures,” filed on Feb. 23, 2009; and U.S. patent application Ser. No. 12/474,655, “Gesture Tool,” filed on May 29, 2009, both of which are incorporated by reference herein in their entirety.

FIG. 5 depicts one embodiment of an AR system 600. AR system 600 includes mapping server 620 and mobile device 630. Mapping server 620 may comprise a mapping system such as mapping system 50 in FIG. 4. Mapping server 620 may work asynchronously to build one or more maps associated with one or more real-world environments. The one or more maps may include sparse 3-D maps and/or dense 3-D maps based on images captured from a variety of sources, including sensors dedicated to mapping server 620. Mobile device 630 may comprise an HMD such as mobile device 140 in FIG. 1. Although a single mobile device 630 is depicted, mapping server 620 may support numerous mobile devices.

The process steps 621-624 associated with mapping server 620 are decoupled from and are performed independently of the process steps 631-635 associated with the mobile device 630. In step 621, images are received from one or more server sensors. One example of the one or more server sensors includes the 3-D and/or RGB cameras within the image camera component 32 in FIG. 4. The one or more server sensors may be associated with one or more stationary cameras and/or one or more mobile cameras which may move about a particular environment over time. The one or more mobile cameras may move in a predetermined or predictable manner about the particular environment (e.g., a dedicated mapping server camera that runs on tracks above a room or hangs from the ceiling of a room and rotates in a deterministic manner). In one embodiment, the one or more server sensors providing images to the mapping server 620 may move about an environment attached to a mobile robot or autonomous vehicle.

In step 622, images received from the one or more server sensors are registered. During image registration, the mapping server 620 may register different images taken within a particular environment (e.g., images from different points of view, taken at different points in time, and/or images associated with different types of information such as color or depth information) into a single real-world coordinate system associated with the particular environment. In one example, image registration into a common coordinate system may be performed using an extrinsic calibration process. The registration and/or alignment of images (or objects within the images) onto a common coordinate system allows the mapping server 620 to compare and integrate real-world objects, landmarks, or other features extracted from the different images into one or more maps associated with the particular environment. The one or more maps associated with a particular environment may comprise 3-D maps.
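
A minimal sketch of registering sensor data into a single real-world coordinate system, assuming each sensor's extrinsic pose (rotation and translation relative to the environment) is already known from calibration; the poses and points below are illustrative only and not taken from the embodiments above.

    import numpy as np

    def make_pose(rotation_3x3, translation_xyz):
        """Build a 4x4 camera-to-world transform from an extrinsic calibration."""
        pose = np.eye(4)
        pose[:3, :3] = rotation_3x3
        pose[:3, 3] = translation_xyz
        return pose

    def register_points(points_in_camera, camera_to_world):
        """Map 3-D points from a sensor's local frame into the shared
        real-world coordinate system of the particular environment."""
        homogeneous = np.c_[points_in_camera, np.ones(len(points_in_camera))]
        return (camera_to_world @ homogeneous.T).T[:, :3]

    # Example: a second depth camera mounted 2 meters to the right of the map origin
    cam_b_pose = make_pose(np.eye(3), [2.0, 0.0, 0.0])
    corner_seen_by_b = np.array([[-2.0, 0.5, 3.0]])
    print(register_points(corner_seen_by_b, cam_b_pose))   # [[0.  0.5 3. ]] in world coordinates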

More information about generating 3-D maps can be found in U.S. patent application Ser. No. 13/017,690, “Three-Dimensional Environment Reconstruction,” incorporated herein by reference in its entirety.

In one embodiment, one or more additional images derived from sources decoupled from mapping server 620 may be used to extend or update the one or more maps associated with the particular environment. For example, client images 640 derived from sensors associated with mobile device 630 may be used by mapping server 620 to refine the one or more maps.

In some embodiments, the registration of one or more images may require knowledge of one or more poses associated with the one or more images. For example, for each image being registered, a six degree of freedom (6DOF) pose may be provided including information associated with the position and orientation of the particular sensor from which the image was captured. Mapping and pose estimation (or localization) associated with a particular sensor may be determined using traditional SLAM or PTAM techniques.

In step 623, a sparse map representing a portion of a particular environment is generated. The sparse map may comprise a 3-D point cloud and may include one or more image descriptors. The mapping server 620 may identify one or more image descriptors associated with a particular map of the one or more maps by applying various image processing methods such as object recognition, feature detection, corner detection, blob detection, and edge detection methods. The one or more image descriptors may be used as landmarks in determining a particular pose, position, and/or orientation in relation to the particular map. An image descriptor may include color and/or depth information associated with a particular object (e.g., a red apple) or a portion of a particular object within the particular environment (e.g., the top of a red apple). In some embodiments, an initial sparse map is generated first and then subsequently refined and updated over time. The initial sparse map may be limited to covering only static objects within the particular environment.
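
One possible sketch of building sparse-map entries from a registered image, assuming the OpenCV ORB detector as the feature and descriptor method and placeholder intrinsics and depth values; the embodiments above do not prescribe a particular detector, so this is illustrative only.

    import cv2
    import numpy as np

    FX = FY = 525.0
    CX, CY = 320.0, 240.0                      # assumed camera intrinsics

    frame = (np.random.rand(480, 640) * 255).astype(np.uint8)   # stand-in server image

    # Detect salient features and compute descriptors; each back-projected
    # (3-D point, descriptor) pair becomes one landmark in the sparse map.
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = orb.detectAndCompute(frame, None)

    sparse_map = []
    if descriptors is not None:
        for kp, desc in zip(keypoints, descriptors):
            u, v = kp.pt
            z = 2.0                            # placeholder depth lookup at (u, v)
            point_3d = ((u - CX) * z / FX, (v - CY) * z / FY, z)
            sparse_map.append({"point_3d": point_3d, "descriptor": desc})
    print(len(sparse_map), "landmarks in the sparse map")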

In step 624, a dense map representing a portion of the particular environment is generated. The dense map may comprise a dense 3-D surface mesh, which may be created using the 3-D point cloud generated in step 623. The dense map may provide sufficient detail of the particular environment so as to enable complex augmented reality applications such as those that must handle collisions and occlusions due to the interaction of real objects and virtual objects within the particular environment. In one embodiment, a particular dense map is not generated by mapping server 620 until a request for the particular dense map is received from mobile device 630. In another embodiment, mobile device 630 may request a particular dense map first without requesting a related sparse map. In some embodiments, the process steps 621-624 associated with mapping server 620 may be performed in the cloud.
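
The following sketch shows one simple way a dense surface mesh could be derived from organized depth data, by back-projecting a regular grid of pixels and connecting neighboring vertices into triangles. It stands in for heavier surface-reconstruction methods and uses assumed intrinsics; it is not the specific reconstruction used by the mapping server.

    import numpy as np

    def grid_mesh(depth_image, fx=525.0, fy=525.0, cx=320.0, cy=240.0, step=8):
        """Back-project a regular grid of depth pixels and connect neighbors
        into triangles, yielding a coarse dense surface mesh."""
        h, w = depth_image.shape
        vs, us = np.mgrid[0:h:step, 0:w:step]
        z = depth_image[vs, us]
        x = (us - cx) * z / fx
        y = (vs - cy) * z / fy
        vertices = np.stack([x, y, z], axis=-1).reshape(-1, 3)

        rows, cols = vs.shape
        idx = np.arange(rows * cols).reshape(rows, cols)
        triangles = []
        for r in range(rows - 1):
            for c in range(cols - 1):
                a, b = idx[r, c], idx[r, c + 1]
                d, e = idx[r + 1, c], idx[r + 1, c + 1]
                triangles += [(a, b, d), (b, e, d)]
        return vertices, np.array(triangles)

    verts, tris = grid_mesh(np.full((480, 640), 2.5))   # flat surface 2.5 m away
    print(verts.shape, tris.shape)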

As mobile device 630 moves around a particular environment, it may request mapping data 641 and/or mapping data 642 associated with the particular environment. The decision to request either mapping data 641 or mapping data 642 depends on the requirements of a particular AR application running on mobile device 630. In step 631, one or more client images are received from one or more client sensors. One example of the one or more client sensors includes the 3-D and/or RGB cameras associated with HMD 200 in FIG. 2. In one embodiment, the one or more client sensors detect light from a patterned or structured light source associated with and projected by mapping server 620. This allows several mobile devices to each independently utilize the same structured light source for localization purposes.

In step 632, one or more client image descriptors are extracted from the one or more client images. The mobile device 630 may identify one or more client image descriptors within the one or more client images by applying various image processing methods such as object recognition, feature detection, corner detection, blob detection, and edge detection methods. In step 633, a particular map is acquired from mapping server 620. The particular map may be included within mapping data 641. The map may be a 3-D map and may include one or more image descriptors associated with one or more objects located within the particular environment described by the 3-D map. An updated map may be requested by mobile device 630 every time the mobile device enters a new portion of the particular environment or at a predefined frequency (e.g., every 10 minutes).

In order to augment a real-world environment as observed from the point of view of mobile device 630, the pose associated with a particular field of view of mobile device 630 in relation to the real-world environment must be determined. In step 634, a pose (e.g., a 6DOF pose) associated with a field of view of mobile device 630 is determined via localization and/or pose estimation. Pose estimation may include the matching and aligning of detected image descriptors. For example, matching may be performed between the one or more client image descriptors and the image descriptors received from mapping server 620 included within mapping data 641. In some embodiments, matching may take into consideration depth and color information associated with the one or more image descriptors. In other embodiments, the one or more client images comprise color images (e.g., RGB images) and matching is performed only for image descriptors including color information, thereby enabling image-based localization for mobile devices with only RGB sensor input. Alignment of matched image descriptors may be performed using image or frame alignment techniques, which may include determining point-to-point correspondences between the field of view of mobile device 630 and views derived from the particular map.
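
For illustration, the alignment half of step 634 can be sketched with OpenCV's PnP solver: given 2-D/3-D correspondences produced by descriptor matching, the 6DOF pose is recovered. The correspondences below are synthesized from a known pose so the sketch is self-contained; the matching step and the intrinsic parameters are assumptions rather than part of the embodiments above.

    import cv2
    import numpy as np

    K = np.array([[525.0, 0.0, 320.0],
                  [0.0, 525.0, 240.0],
                  [0.0, 0.0, 1.0]])            # assumed mobile-device intrinsics

    # Map landmarks that matched client image descriptors (synthetic here).
    object_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                              [1.0, 1.0, 0.0], [0.5, 0.5, 0.5], [0.2, 0.8, 0.3]])

    # Ground-truth pose used only to synthesize the observed pixel coordinates.
    rvec_true = np.array([0.1, -0.2, 0.05])
    tvec_true = np.array([0.3, -0.1, 4.0])
    image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, None)

    # Recover the 6DOF pose of the mobile device from the correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points, K, None)
    print(ok, rvec.ravel(), tvec.ravel())      # approximates rvec_true / tvec_true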

In some embodiments, mobile device 630 may include a GPS receiver, such as GPS receiver 232 in FIG. 2, and a motion and orientation sensor, such as motion and orientation sensor 238 in FIG. 2. Prior to image processing, a first-pass estimate for the pose associated with mobile device 630 may be obtained by utilizing GPS location information and orientation information generated on mobile device 630. Mobile device 630 may also obtain location and orientation information by detecting, tracking, and triangulating the position of physical tags (e.g., reflective markers) or emitters (e.g., LEDs) attached to one or more other mobile devices using marker-based motion capture technologies. The location and/or orientation information associated with mobile device 630 may be used to improve the accuracy of the localization step.
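
A minimal sketch of such a first-pass pose prior, assuming a planar local approximation of the GPS fix relative to a map origin and a heading angle from the orientation sensor; the coordinates are hypothetical, and the prior would be refined by the image-based localization described above.

    import numpy as np

    EARTH_RADIUS_M = 6_371_000.0

    def coarse_pose(lat_deg, lon_deg, alt_m, yaw_rad, origin_lat_deg, origin_lon_deg):
        """First-pass pose prior: GPS gives approximate east/north/up offsets
        from the map origin; the orientation sensor supplies heading."""
        lat, lon = np.radians(lat_deg), np.radians(lon_deg)
        lat0, lon0 = np.radians(origin_lat_deg), np.radians(origin_lon_deg)
        east = EARTH_RADIUS_M * (lon - lon0) * np.cos(lat0)
        north = EARTH_RADIUS_M * (lat - lat0)

        c, s = np.cos(yaw_rad), np.sin(yaw_rad)
        pose = np.eye(4)
        pose[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # heading only
        pose[:3, 3] = [east, north, alt_m]
        return pose

    print(coarse_pose(47.6421, -122.1391, 12.0, np.radians(30.0), 47.6420, -122.1392))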

In one embodiment, localization of mobile device 630 includes searching for image descriptors associated with landmarks or other objects within a field of view of mobile device 630. The extracted image descriptors may subsequently be matched with image descriptors associated with the most recent map of the particular environment acquired from mapping server 620 in step 633. Landmarks may be associated with image features that are easily observed and distinguished from other features within the field of view. Stationary landmarks may act as anchor points or reference points in a 3-D map of a real-world environment. Other points of interest within a real-world environment (e.g., geographical elements such as land features) may also be used for both mapping and localization purposes. Landmarks and other points of interest within the real-world environment may be detected using various image processing methods such as object recognition, frame alignment, feature detection, corner detection, blob detection, and edge detection methods. As mobile device 630 moves around within an environment, mobile device 630 may continuously search for and extract landmarks and associate newly discovered landmarks with those found previously.

For each object within an environment, there may be one or more image descriptors associated with the object that can be extracted and used to identify the object. An image descriptor may comprise image information related to a portion of an object or to the entire object. In one example, an image descriptor may describe characteristics of the object such as its location, color, texture, shape, and/or its relationship to other objects or landmarks within the environment. Utilizing image processing techniques such as object and pattern matching, the one or more image descriptors may be used to locate an object in an image containing other objects. It is desirable that the image processing techniques for detecting and matching the one or more image descriptors be robust to changes in image scale, noise, illumination, local geometric distortion, and image orientation.

In some embodiments, localization of mobile device 630 may be performed in the cloud (e.g., by the mapping server 620). For example, the mobile device 630 may transmit client images 640 derived from sensors associated with mobile device 630 and/or image descriptors 644 extracted in step 632 to the mapping server 620. The mapping server 620 may perform the mobile device localization based on the received client images 640 and/or image descriptors 644 utilizing process steps similar to those performed by mobile device 630 in step 634. Subsequently, mobile device 630 may acquire a pose associated with a field of view of mobile device 630 from the mapping server 620 without needing to acquire the particular map in step 633 or perform localization step 634 (i.e., steps 633 and 634 may be omitted if localization of the mobile device is performed by the mapping server).

More information regarding performing pose estimation and/or localization for a mobile device can be found in U.S. patent application Ser. No. 13/017,474, “Mobile Camera Localization Using Depth Maps,” incorporated herein by reference in its entirety.

More information regarding differentiating objects within an environment can be found in U.S. patent application Ser. No. 13/017,626, “Moving Object Segmentation Using Depth Images,” incorporated herein by reference in its entirety.

Referring to FIG. 5, in step 635, one or more virtual objects are displayed on mobile device 630. The one or more virtual objects may be rendered on mobile device 630 using a processing element, such as processing unit 236 in FIG. 2. Rendering may involve the use of ray tracing techniques (e.g., simulating the propagation of light and sound waves through a model of the AR environment) in order to generate virtual images and/or sounds from the perspective of a mobile device. Virtual data associated with the one or more virtual objects may be generated by the mapping server 620 and/or mobile device 630.

In one embodiment, the one or more virtual objects are rendered locally on mobile device 630. In another embodiment, the one or more virtual objects are rendered on mapping server 620 and virtual images corresponding with the pose determined in step 634 may be transmitted to mobile device 630. Virtual images associated with the one or more virtual objects may be displayed on mobile device 630 such that the one or more virtual objects are perceived to exist within a field of view associated with the previously determined pose of step 634. The virtual data associated with the one or more virtual objects may also include virtual sounds. In some embodiments, mobile device 630 may request a dense 3-D map of a particular environment in order to realistically render virtual objects and to handle collisions and occlusions due to the interaction of real objects and virtual objects within the particular environment. Virtual sounds may also be registered in relation to the dense 3-D map. The dense 3-D map may be acquired from mapping server 620 and included within mapping data 642.
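
As a simplified illustration of generating a virtual image from the determined pose, the sketch below projects the world-space vertices of a registered virtual object into pixel coordinates with a pinhole model. The intrinsics are assumed values, and full rendering (shading, occlusion against the dense map, ray tracing) is omitted.

    import numpy as np

    K = np.array([[525.0, 0.0, 320.0],
                  [0.0, 525.0, 240.0],
                  [0.0, 0.0, 1.0]])             # assumed display-camera intrinsics

    def project_virtual_points(world_points, world_to_camera):
        """Project registered virtual-object vertices into the pixel
        coordinates of the view defined by the determined pose."""
        pts = np.c_[world_points, np.ones(len(world_points))]
        cam = (world_to_camera @ pts.T).T[:, :3]
        cam = cam[cam[:, 2] > 0]                # keep vertices in front of the camera
        pixels = (K @ cam.T).T
        return pixels[:, :2] / pixels[:, 2:3]

    # Example: a virtual marker 3 m in front of the device, slightly to the left
    pose = np.eye(4)                            # device at the map origin, looking along +Z
    print(project_virtual_points(np.array([[-0.2, 0.0, 3.0]]), pose))   # ≈ [[285, 240]]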

More information regarding AR systems and environments can be found in U.S. patent application Ser. No. 12/912,937, “Low-Latency Fusing of Virtual and Real Content,” incorporated herein by reference in its entirety.

More information regarding generating virtual sounds in an AR environment can be found in U.S. patent application Ser. No. 12/903,610, “System and Method for High-Precision 3-Dimensional Audio for Augmented Reality,” incorporated herein by reference in its entirety.

FIG. 6A is a flowchart describing one embodiment of a process for generating a 3-D map of a real-world environment and locating virtual objects within the 3-D map. The process of FIG. 6A may be performed continuously and by one or more computing devices. Each step in the process of FIG. 6A may be performed by the same or different computing devices as those used in other steps, and each step need not necessarily be performed by a single computing device. In one embodiment, the process of FIG. 6A is performed by a mapping server such as mapping server 620 in FIG. 5.

In step 702, a 3-D map of a real-world environment is generated. In step 704, one or more virtual objects may be computer generated. In one example, the 3-D map and the one or more virtual objects are generated using a mapping server. The one or more virtual objects may be stored in a database on the mapping server. In step 706, the one or more virtual objects may be registered in relation to the 3-D map. If registration is performed by the mapping server, then the mapping server may control the location of the virtual objects being displayed by one or more mobile devices.

In step 708, a determination is made as to whether or not the location of one or more virtual objects in relation to the 3-D map has changed. The change in location may be due to a perceived interaction between one or more virtual objects and/or one or more real objects, or due to the natural movement of a virtual object through an environment (e.g., via physics simulation). If the location of one or more virtual objects has changed, then the one or more virtual objects are reregistered in relation to the 3-D map in step 706. If the one or more virtual objects are all characterized as static, then step 708 may be omitted.
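
One way to picture the natural-movement case is a small per-frame physics update, after which the moved object would be reregistered in step 706. The time step, restitution, and floor plane below are hypothetical values used only for illustration.

    GRAVITY = -9.81   # m/s^2

    def step_ball(position, velocity, dt=1.0 / 30.0, floor_y=0.0, restitution=0.6):
        """One physics step for a virtual ball: apply gravity, integrate,
        and bounce off a mapped floor plane."""
        vx, vy, vz = velocity
        vy += GRAVITY * dt
        x, y, z = position
        x, y, z = x + vx * dt, y + vy * dt, z + vz * dt
        if y < floor_y:                         # collision with a surface from the 3-D map
            y, vy = floor_y, -vy * restitution
        return (x, y, z), (vx, vy, vz)

    pos, vel = (0.0, 1.0, 2.0), (0.5, 0.0, 0.0)
    for _ in range(90):                         # simulate three seconds at 30 Hz
        pos, vel = step_ball(pos, vel)          # each change would trigger reregistration
    print(pos)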

In step 710, location information associated with one or more mobile devices is received. The received location information may be used by a mapping server to customize a particular map associated with the location information. In step 712, at least a portion of the 3-D map is outputted to the one or more mobile devices. The 3-D map may include a customized map of an environment associated with the location information received in step 710. The 3-D map may include one or more image descriptors. The one or more image descriptors may include RGB image descriptors and/or 3-D image descriptors. In one embodiment, a mapping server outputs one or more image descriptors in response to a request from a mobile device for either RGB image descriptors or 3-D image descriptors, depending on the requirements of a particular application running on the mobile device. In step 714, virtual data associated with the one or more virtual objects is outputted to the one or more mobile devices.

FIG. 6B is a flowchart describing one embodiment of a process for updating a 3-D map of a real-world environment. The process of FIG. 6B may be performed continuously and by one or more computing devices. Each step in the process of FIG. 6B may be performed by the same or different computing devices as those used in other steps, and each step need not necessarily be performed by a single computing device. In one embodiment, the process of FIG. 6B is performed by a mapping server such as mapping server 620 in FIG. 5.

In step 802, one or more images of a first environment are acquired from one or more cameras associated with a mapping server. In step 804, the one or more images are registered. In step 806, a first map of the first environment is generated. In step 808, one or more objects existing within the first environment are detected and characterized. Each of the one or more objects may be characterized as static, semi-static, or dynamic. An object characterized as static is one that does not move. An object characterized as semi-static is one that does not move often. Semi-static objects may be useful for mapping and/or localization purposes because they may act as static landmarks over a period of time within an environment. Object recognition techniques may be used to identify objects, such as furniture or appliances, that may typically be classified as semi-static objects. In one embodiment, maps of an environment are stored and time stamped (e.g., every 2 hours) in order to detect movement of landmarks and to possibly recharacterize one or more objects. In some embodiments, mapping and localization are performed using only objects classified as static, thereby discarding any objects characterized as semi-static or dynamic (i.e., not taking into consideration any moving objects).
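
A minimal sketch of characterizing an object from time-stamped maps: an object that never moves is static, one that moves in only a small fraction of the snapshot intervals is semi-static, and anything else is dynamic. The motion threshold and cutoff fraction are assumed values, not taken from the embodiments above.

    import numpy as np

    def characterize(positions_over_time, move_threshold_m=0.05, semi_static_cutoff=0.3):
        """Label an object from its position in successive time-stamped maps."""
        positions = np.asarray(positions_over_time, dtype=float)
        steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
        moved_fraction = np.mean(steps > move_threshold_m)
        if moved_fraction == 0.0:
            return "static"
        if moved_fraction < semi_static_cutoff:
            return "semi-static"
        return "dynamic"

    # A couch nudged once across five snapshots vs. a person walking
    print(characterize([[0, 0, 0], [0, 0, 0], [0.3, 0, 0], [0.3, 0, 0], [0.3, 0, 0]]))  # semi-static
    print(characterize([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]]))                   # dynamic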

In step 810, one or more additional images of the first environment are received from a first mobile device. Localization information associated with the one or more additional images may also be received from the first mobile device. Obtaining pose information along with the one or more additional images allows a mapping server to generate one or more maps associated with an environment beyond that capable of being sensed by dedicated mapping server sensors. The one or more additional images may also include depth information. In step 812, the one or more additional images are registered. In some embodiments, the one or more additional images are registered by a mapping server. A mapping server may also accept information from one or more mobile devices in the form of registered RGB data and/or depth reconstructions. In step 814, the first map is updated based on the one or more additional images. In step 816, one or more new objects existing within the first environment are detected and characterized. In step 818, at least a subset of the updated first map is transmitted to a second mobile device. In one embodiment, a first set of mobile devices may be used to update the first map, while a second set of mobile devices performs localization in relation to the updated first map.

FIG. 7 is a flowchart describing one embodiment of a process for rendering and displaying virtual objects. The process of FIG. 7 may be performed continuously and by one or more computing devices. Each step in the process of FIG. 7 may be performed by the same or different computing devices as those used in other steps, and each step need not necessarily be performed by a single computing device. In one embodiment, the process of FIG. 7 is performed by a mobile device such as mobile device 630 in FIG. 5.

In step 902, it is determined whether a first map is required. In one example, a mobile device may determine whether a first map is required for localization purposes by determining a location of the mobile device (e.g., by acquiring a GPS location) and comparing the location with the locations associated with one or more maps stored on the mobile device. A first map may be required if none of the one or more maps stored on the mobile device provides adequate coverage of a particular environment, or if no stored map is deemed to be a valid map of the environment encompassing the location. A particular map stored on a mobile device may be deemed invalid because the map has not been refreshed within a certain period of time or because an expiration date associated with the particular map has passed. In one example, a particular map stored on a mobile device may be invalidated by a mapping server upon an update to the particular map made by the mapping server.
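
The decision in step 902 can be pictured with the following sketch, which treats each stored map as covering a simple region around a center location and tracks a refresh time and a server-invalidation flag; the coverage model and refresh window are illustrative assumptions.

    import time

    def first_map_required(device_location, stored_maps, max_age_s=2 * 60 * 60):
        """Return True when no stored map validly covers the device location."""
        for m in stored_maps:
            covers = (abs(device_location[0] - m["center"][0]) < m["radius"] and
                      abs(device_location[1] - m["center"][1]) < m["radius"])
            fresh = (time.time() - m["last_refresh"]) < max_age_s and not m["invalidated"]
            if covers and fresh:
                return False                    # an adequate, valid map is already stored
        return True

    stored = [{"center": (10.0, 4.0), "radius": 15.0,
               "last_refresh": time.time() - 600, "invalidated": False}]
    print(first_map_required((12.0, 5.0), stored))   # False: a valid stored map covers us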

In step 904, the first map is acquired. The first map may be acquired from a mapping server. In one embodiment, a mobile device may transmit a request for the first map to a mapping server. In another embodiment, a mobile device may transmit a request for the first map to a second mobile device located within a particular environment. In this case, the second mobile device may act as a local mapping cache for the mobile device. The first map may include one or more image descriptors associated with one or more real objects within a particular environment. In step 906, the first map is stored. The first map may be stored in a non-volatile memory within a mobile device.

In step 908, a first pose associated with the mobile device in relation to the first map is determined. The pose estimation or determination may be performed on a mobile device or on a mapping server. In one embodiment, localization is performed on a mobile device using the process of step 634 in FIG. 5. When performing localization on the mobile device, image descriptors associated with non-static objects may be discarded. In another embodiment, a mobile device may transmit images of its environment to a mapping server and the mapping server may perform the localization process on behalf of the mobile device.

In step 910, it is determined whether a second map is required. In one example, a mobile device may determine whether a second map is required for rendering and/or display purposes by determining whether a virtual object is located within (or near) a field of view associated with the first pose determined in step 908. The second map may comprise a higher resolution map of a portion of the first map. In step 912, the second map is acquired. In step 914, the second map is stored. In step 916, a virtual object is registered in relation to the second map. In one example, a mobile device registers the virtual object in relation to the second map prior to rendering the virtual object.
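
The check in step 910 can be illustrated as a simple frustum test against the first pose: if a registered virtual object falls inside (or near) the viewing cone, the higher-resolution second map is requested. The field-of-view, range, and margin values below are assumptions, not taken from the embodiments above.

    import numpy as np

    def object_in_view(object_world, world_to_camera, h_fov_deg=40.0,
                       max_range_m=10.0, margin_deg=5.0):
        """True when a virtual object lies inside (or near) the field of view
        associated with the first pose."""
        p = world_to_camera @ np.append(object_world, 1.0)
        x, y, z = p[:3]
        if z <= 0 or z > max_range_m:
            return False
        angle = np.degrees(np.arctan2(abs(x), z))
        return angle < h_fov_deg / 2.0 + margin_deg

    pose = np.eye(4)                             # device at the origin, looking along +Z
    print(object_in_view(np.array([0.5, 0.0, 4.0]), pose))   # True: request the second map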

In step 918, the virtual object is rendered. The rendering step may include receiving virtual data associated with the virtual object from a mapping server and generating one or more virtual images of the virtual object based on the virtual data. In step 920, the virtual object is displayed. The virtual object may be displayed by displaying on a mobile device one or more virtual images associated with the virtual object. The one or more virtual images may correspond with a view of the virtual object such that the virtual object is perceived to exist within the field of view associated with the first pose determined in step 908.

The disclosed technology may be used with various computing systems. FIGS. 8-10 provide examples of various computing systems that can be used to implement embodiments of the disclosed technology.

FIG. 8 is a block diagram of an embodiment of a gaming and media system 7201. Console 7203 has a central processing unit (CPU) 7200 and a memory controller 7202 that facilitates processor access to various types of memory, including a flash Read Only Memory (ROM) 7204, a Random Access Memory (RAM) 7206, a hard disk drive 7208, and portable media drive 7107. In one implementation, CPU 7200 includes a level 1 cache 7210 and a level 2 cache 7212 to temporarily store data and hence reduce the number of memory access cycles made to the hard drive 7208, thereby improving processing speed and throughput.

CPU 7200, memory controller 7202, and various memory devices are interconnected via one or more buses (not shown). The one or more buses might include one or more of serial and parallel buses, a memory bus, a peripheral bus, and a processor or local bus, using any of a variety of bus architectures. By way of example, such architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.

In one implementation, CPU 7200, memory controller 7202, ROM 7204, and RAM 7206 are integrated onto a common module 7214. In this implementation, ROM 7204 is configured as a flash ROM that is connected to memory controller 7202 via a PCI bus and a ROM bus (neither of which are shown). RAM 7206 is configured as multiple Double Data Rate Synchronous Dynamic RAM (DDR SDRAM) modules that are independently controlled by memory controller 7202 via separate buses (not shown). Hard disk drive 7208 and portable media drive 7107 are shown connected to the memory controller 7202 via the PCI bus and an AT Attachment (ATA) bus 7216. However, in other implementations, dedicated data bus structures of different types may also be applied in the alternative.

A three-dimensional graphics processing unit 7220 and a video encoder 7222 form a video processing pipeline for high speed and high resolution (e.g., High Definition) graphics processing. Data are carried from graphics processing unit 7220 to video encoder 7222 via a digital video bus (not shown). An audio processing unit 7224 and an audio codec (coder/decoder) 7226 form a corresponding audio processing pipeline for multi-channel audio processing of various digital audio formats. Audio data are carried between audio processing unit 7224 and audio codec 7226 via a communication link (not shown). The video and audio processing pipelines output data to an A/V (audio/video) port 7228 for transmission to a television or other display. In the illustrated implementation, video and audio processing components 7220-7228 are mounted on module 7214.

FIG. 8 shows module 7214 including a USB host controller 7230 and a network interface 7232. USB host controller 7230 is in communication with CPU 7200 and memory controller 7202 via a bus (not shown) and serves as host for peripheral controllers 7205(1)-7205(4). Network interface 7232 provides access to a network (e.g., the Internet, a home network, etc.) and may be any of a wide variety of wired or wireless interface components, including an Ethernet card, a modem, a wireless access card, a Bluetooth® module, a cable modem, and the like.

In the implementation depicted in FIG. 8, console 7203 includes a controller support subassembly 7240 for supporting four controllers 7205(1)-7205(4). The controller support subassembly 7240 includes any hardware and software components needed to support wired and wireless operation with an external control device, such as, for example, a media and game controller. A front panel I/O subassembly 7242 supports the multiple functionalities of the power button 7213 and the eject button 7215, as well as any LEDs (light emitting diodes) or other indicators exposed on the outer surface of console 7203. Subassemblies 7240 and 7242 are in communication with module 7214 via one or more cable assemblies 7244. In other implementations, console 7203 can include additional controller subassemblies. The illustrated implementation also shows an optical I/O interface 7235 that is configured to send and receive signals (e.g., from remote control 7290) that can be communicated to module 7214.

MUs 7241(1) and 7241(2) are illustrated as being connectable to MU ports “A” 7231(1) and “B” 7231(2), respectively. Additional MUs (e.g., MUs 7241(3)-7241(6)) are illustrated as being connectable to controllers 7205(1) and 7205(3), i.e., two MUs for each controller. Controllers 7205(2) and 7205(4) can also be configured to receive MUs (not shown). Each MU 7241 offers additional storage on which games, game parameters, and other data may be stored. Additional memory devices, such as portable USB devices, can be used in place of the MUs. In some implementations, the other data can include any of a digital game component, an executable gaming application, an instruction set for expanding a gaming application, and a media file. When inserted into console 7203 or a controller, MU 7241 can be accessed by memory controller 7202. A system power supply module 7250 provides power to the components of gaming system 7201. A fan 7252 cools the circuitry within console 7203.

An application 7260 comprising machine instructions is stored on hard disk drive 7208. When console 7203 is powered on, various portions of application 7260 are loaded into RAM 7206, and/or caches 7210 and 7212, for execution on CPU 7200. Other applications may also be stored on hard disk drive 7208 for execution on CPU 7200.

Gaming and media system 7201 may be operated as a standalone system by simply connecting the system to a monitor, a television, a video projector, or other display device. In this standalone mode, gaming and media system 7201 enables one or more players to play games or enjoy digital media (e.g., by watching movies or listening to music). However, with the integration of broadband connectivity made available through network interface 7232, gaming and media system 7201 may further be operated as a participant in a larger network gaming community.

FIG. 9 is a block diagram of one embodiment of a mobile device 8300. Mobile devices may include laptop computers, pocket computers, mobile phones, personal digital assistants, and handheld media devices that have been integrated with wireless receiver/transmitter technology.

Mobile device 8300 includes one or more processors 8312 and memory 8310. Memory 8310 includes applications 8330 and non-volatile storage 8340. Memory 8310 can be any variety of memory storage media types, including non-volatile and volatile memory. A mobile device operating system handles the different operations of the mobile device 8300 and may contain user interfaces for operations, such as placing and receiving phone calls, text messaging, checking voicemail, and the like. The applications 8330 can be any assortment of programs, such as a camera application for photos and/or videos, an address book, a calendar application, a media player, an internet browser, games, an alarm application, and other applications. The non-volatile storage component 8340 in memory 8310 may contain data such as music, photos, contact data, scheduling data, and other files.

The one or more processors 8312 also communicate with RF transmitter/receiver 8306, which in turn is coupled to an antenna 8302, with infrared transmitter/receiver 8308, with global positioning system (GPS) receiver 8365, and with movement/orientation sensor 8314, which may include an accelerometer and/or magnetometer. RF transmitter/receiver 8306 may enable wireless communication via various wireless technology standards such as Bluetooth® or the IEEE 802.11 standards. Accelerometers have been incorporated into mobile devices to enable applications such as intelligent user interface applications that let users input commands through gestures, and orientation applications which can automatically change the display from portrait to landscape when the mobile device is rotated. An accelerometer can be provided, e.g., by a micro-electromechanical system (MEMS), which is a tiny mechanical device (of micrometer dimensions) built onto a semiconductor chip. Acceleration direction, as well as orientation, vibration, and shock, can be sensed. The one or more processors 8312 further communicate with a ringer/vibrator 8316, a user interface keypad/screen 8318, a speaker 8320, a microphone 8322, a camera 8324, a light sensor 8326, and a temperature sensor 8328. The user interface keypad/screen may include a touch-sensitive screen display.

The one or more processors 8312 control transmission and reception of wireless signals. During a transmission mode, the one or more processors 8312 provide voice signals from microphone 8322, or other data signals, to the RF transmitter/receiver 8306. The transmitter/receiver 8306 transmits the signals through the antenna 8302. The ringer/vibrator 8316 is used to signal an incoming call, text message, calendar reminder, alarm clock reminder, or other notification to the user. During a receiving mode, the RF transmitter/receiver 8306 receives a voice signal or data signal from a remote station through the antenna 8302. A received voice signal is provided to the speaker 8320 while other received data signals are processed appropriately.

Additionally, a physical connector 8388 may be used to connect the mobile device 8300 to an external power source, such as an AC adapter or powered docking station, in order to recharge battery 8304. The physical connector 8388 may also be used as a data connection to an external computing device. The data connection allows for operations such as synchronizing mobile device data with the computing data on another device.

FIG. 10 is a block diagram of an embodiment of a computing system environment 2200. Computing system environment 2200 includes a general purpose computing device in the form of a computer 2210. Components of computer 2210 may include, but are not limited to, a processing unit 2220, a system memory 2230, and a system bus 2221 that couples various system components including the system memory 2230 to the processing unit 2220. The system bus 2221 may be any of several types of bus structures including a memory bus, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer 2210 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 2210 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 2210. Combinations of any of the above should also be included within the scope of computer readable media.

The system memory 2230 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 2231 and random access memory (RAM) 2232. A basic input/output system 2233 (BIOS), containing the basic routines that help to transfer information between elements within computer 2210, such as during start-up, is typically stored in ROM 2231. RAM 2232 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 2220. By way of example, and not limitation, FIG. 10 illustrates operating system 2234, application programs 2235, other program modules 2236, and program data 2237.

The computer 2210 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 10 illustrates a hard disk drive 2241 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 2251 that reads from or writes to a removable, nonvolatile magnetic disk 2252, and an optical disk drive 2255 that reads from or writes to a removable, nonvolatile optical disk 2256 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 2241 is typically connected to the system bus 2221 through a non-removable memory interface such as interface 2240, and magnetic disk drive 2251 and optical disk drive 2255 are typically connected to the system bus 2221 by a removable memory interface, such as interface 2250.

The drives and their associated computer storage media discussed above and illustrated in FIG. 10 provide storage of computer readable instructions, data structures, program modules, and other data for the computer 2210. In FIG. 10, for example, hard disk drive 2241 is illustrated as storing operating system 2244, application programs 2245, other program modules 2246, and program data 2247. Note that these components can either be the same as or different from operating system 2234, application programs 2235, other program modules 2236, and program data 2237. Operating system 2244, application programs 2245, other program modules 2246, and program data 2247 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into computer 2210 through input devices such as a keyboard 2262 and pointing device 2261, commonly referred to as a mouse, trackball, or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 2220 through a user input interface 2260 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB). A monitor 2291 or other type of display device is also connected to the system bus 2221 via an interface, such as a video interface 2290. In addition to the monitor, computers may also include other peripheral output devices such as speakers 2297 and printer 2296, which may be connected through an output peripheral interface 2295.

The computer 2210 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 2280. The remote computer 2280 may be a personal computer, a server, a router, a network PC, a peer device, or other common network node, and typically includes many or all of the elements described above relative to the computer 2210, although only a memory storage device 2281 has been illustrated in FIG. 10. The logical connections depicted in FIG. 10 include a local area network (LAN) 2271 and a wide area network (WAN) 2273, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.

When used in a LAN networking environment, the computer 2210 is connected to the LAN 2271 through a network interface or adapter 2270. When used in a WAN networking environment, the computer 2210 typically includes a modem 2272 or other means for establishing communications over the WAN 2273, such as the Internet. The modem 2272, which may be internal or external, may be connected to the system bus 2221 via the user input interface 2260, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 2210, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 10 illustrates remote application programs 2285 as residing on memory device 2281. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

The disclosed technology is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the technology include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The disclosed technology may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, software and program modules as described herein include routines, programs, objects, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Hardware or combinations of hardware and software may be substituted for software modules as described herein.

The disclosed technology may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

For purposes of this document, references in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “another embodiment” are used to describe different embodiments and do not necessarily refer to the same embodiment.

For purposes of this document, a connection can be a direct connection or an indirect connection (e.g., via another part).

For purposes of this document, the term “set” of objects refers to a “set” of one or more of the objects.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

1. A method for providing an augmented reality environment on a mobile device, comprising: determining whether a localization map is required, the step of determining whether a localization map is required is performed on the mobile device and includes acquiring a first location associated with the mobile device, the localization map is associated with a first environment; acquiring the localization map, the localization map includes one or more image descriptors, the one or more image descriptors are associated with one or more real objects within the first environment; storing the localization map on the mobile device; determining a first pose associated with the mobile device, the step of determining a first pose is performed on the mobile device and includes receiving one or more images, the one or more images comprise a field of view associated with the first pose, the step of determining a first pose includes detecting at least one of the one or more image descriptors within the one or more images; determining whether a rendering map is required, the step of determining whether a rendering map is required includes determining whether a virtual object is located within the field of view associated with the first pose, the rendering map is associated with at least a portion of the first environment, the rendering map has a higher resolution than the localization map; acquiring the rendering map; storing the rendering map on the mobile device; rendering the virtual object, the step of rendering includes registering the virtual object in relation to the rendering map; and displaying on the mobile device a virtual image associated with the virtual object, the virtual image corresponds with a view of the virtual object such that the virtual object is perceived to exist within the field of view associated with the first pose.
 2. The method of claim 1, wherein: the step of rendering is performed on the mobile device, the step of rendering includes receiving virtual data associated with the virtual object from a mapping server, the step of rendering includes generating the virtual image of the virtual object based on the virtual data, the step of rendering includes generating virtual sounds associated with the virtual object.
 3. The method of claim 1, wherein: the step of determining whether a localization map is required includes determining whether another map stored on the mobile device is valid and associated with the first environment; the step of acquiring a first location includes acquiring the first location from a GPS receiver associated with the mobile device; each of the one or more real objects are characterized as one of static objects or semi-static objects; the localization map includes a sparse 3-D map of the first environment; the rendering map includes a dense 3-D map of the first environment; and the mobile device includes a head mounted display device.
 4. The method of claim 1, wherein: the step of determining a first pose includes determining a first position and a first orientation of the mobile device in relation to the localization map, the step of determining a first pose includes discarding each of the one or more image descriptors that are associated with semi-static objects.
 5. The method of claim 1, wherein: the step of acquiring the localization map includes transmitting a request for the localization map to a mapping server.
 6. The method of claim 1, wherein: the step of acquiring the localization map includes transmitting a request for the localization map to a second mobile device, the second mobile device is located within the first environment.
 7. The method of claim 1, wherein: the step of determining whether a virtual object is within the field of view associated with the first pose is performed on the mobile device.
 8. The method of claim 1, wherein: the step of acquiring the localization map includes transmitting the first location associated with the mobile device.
 9. An electronic device for providing an augmented reality environment, comprising: one or more processors, the one or more processors determine whether a localization map is required, the one or more processors request the localization map from a mapping server; a network interface, the network interface receives the localization map, the localization map includes one or more image descriptors, the one or more image descriptors are associated with one or more real objects within a first environment, the one or more processors determine a first pose associated with a field of view of the electronic device, the network interface receives virtual data associated with the virtual object, the one or more processors determine whether a rendering map is required based on the location of the virtual object within the field of view, the network interface receives the rendering map, the rendering map has a higher resolution than the localization map; a memory, the memory stores the rendering map, the one or more processors render the virtual object in relation to the rendering map; and a display, the display displays a virtual image associated with the virtual object, the virtual image corresponds with a view of the virtual object such that the virtual object is perceived to exist within the field of view.
 10. The electronic device of claim 9, wherein: the electronic device is a mobile device; and the network interface receives virtual data from a mapping server.
 11. The electronic device of claim 9, wherein: each of the one or more real objects are characterized as one of static objects or semi-static objects; the localization map includes a sparse 3-D map of the first environment; the rendering map includes a dense 3-D map of the first environment; and the electronic device includes a head mounted display device.
 12. The electronic device of claim 9, wherein: the determination of the first pose includes determining a first position and a first orientation of the electronic device in relation to the localization map.
 13. The electronic device of claim 9, wherein: the network interface receives the localization map from a mapping server.
 14. The electronic device of claim 9, wherein: the network interface receives the localization map from a second electronic device, the second electronic device is a mobile device.
 15. One or more storage devices containing processor readable code for programming one or more processors to perform a method comprising the steps of: determining whether a localization map is required, the step of determining whether a localization map is required is performed on a mobile device and includes acquiring a first location associated with the mobile device, the localization map is associated with a first environment, the step of determining whether a localization map is required includes determining whether another map stored on the mobile device is valid and associated with the first environment; acquiring the localization map, the localization map includes one or more image descriptors, the one or more image descriptors are associated with one or more real objects within the first environment; storing the localization map on the mobile device; determining a first pose associated with the mobile device, the step of determining a first pose is performed on the mobile device and includes receiving one or more images, the one or more images comprise a field of view associated with the first pose, the step of determining a first pose includes detecting at least one of the one or more image descriptors within the one or more images; determining whether a rendering map is required, the step of determining whether a rendering map is required includes determining whether a virtual object is located within the field of view associated with the first pose, the step of determining whether a virtual object is within the field of view associated with the first pose is performed on the mobile device, the rendering map is associated with at least a portion of the first environment, the rendering map has a higher resolution than the localization map; acquiring the rendering map; storing the rendering map on the mobile device; rendering the virtual object, the step of rendering includes registering the virtual object in relation to the rendering map; and playing on the mobile device a virtual sound associated with the virtual object, the virtual sound is perceived to originate from a location associated with the virtual object.
 16. The one or more storage devices of claim 15, wherein: the step of rendering is performed on the mobile device, the step of rendering includes receiving virtual data associated with the virtual object from a mapping server.
 17. The one or more storage devices of claim 16, wherein: the step of acquiring a first location includes acquiring the first location from a GPS receiver associated with the mobile device; the localization map includes a sparse 3-D map of the first environment; the rendering map includes a dense 3-D map of the first environment; and the mobile device includes a head mounted display device.
 18. The one or more storage devices of claim 16, wherein: the step of determining a first pose includes determining a first position and a first orientation of the mobile device in relation to the localization map, the step of determining a first pose includes discarding each of the one or more image descriptors that are characterized as non-static objects.
 19. The one or more storage devices of claim 18, wherein: the step of acquiring the localization map includes transmitting a request for the localization map to a mapping server.
 20. The one or more storage devices of claim 18, wherein: the step of acquiring the localization map includes transmitting a request for the localization map to a second mobile device, the second mobile device is located within the first environment. 